Preparing lessons: Improve knowledge distillation with better supervision
Authors
Abstract
Knowledge distillation (KD) is widely applied in the training of efficient neural networks. A compact model, trained to mimic the representation of a cumbersome model on the same task, generally obtains better performance than when trained on ground-truth labels alone. Previous KD-based works mainly focus on two aspects: (1) designing various forms of feature knowledge to transfer; (2) introducing different mechanisms such as progressive learning or adversarial learning. In this paper, we revisit standard KD and observe that the teacher's logits might provide incorrect or uncertain supervision. To tackle these problems, we propose two novel approaches that deal with them respectively, called Logits Adjustment (LA) and Dynamic Temperature Distillation (DTD). To be specific, LA rectifies the teacher's logits according to the ground-truth label with certain rules, while DTD treats the temperature as a dynamic, sample-wise parameter rather than a static global hyper-parameter, which effectively reflects the uncertainty of each sample's logits. By iteratively updating the temperature, the student pays more attention to samples that confuse the teacher model. Experiments on CIFAR-10/100, CINIC-10 and Tiny ImageNet verify that the proposed methods yield encouraging improvements over KD. Furthermore, given their simple implementations, the methods can easily be attached to many KD-based frameworks and bring improvements without extra cost in time or computing resources.
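The abstract gives no implementation details, so the following PyTorch sketch only illustrates the two ideas in spirit: adjust_teacher_logits applies one possible rectification rule (swapping a wrongly predicted top logit with the true-class logit), and dynamic_temperature_kd_loss derives a per-sample temperature from the teacher's normalized entropy. Both rules, the function names, and the hyper-parameters base_temp and alpha are assumptions made for illustration, not the formulas from the paper.

```python
# Illustrative sketch of Logits Adjustment (LA) and Dynamic Temperature
# Distillation (DTD) style training signals. The concrete rules below are
# assumptions, not the exact procedures proposed in the paper.
import torch
import torch.nn.functional as F


def adjust_teacher_logits(teacher_logits, labels):
    """Rectify teacher logits whose argmax disagrees with the ground truth.

    Assumed rule: swap the (wrong) top logit with the true-class logit so the
    soft target still ranks the correct class first.
    """
    adjusted = teacher_logits.clone()
    pred = adjusted.argmax(dim=1)
    wrong = (pred != labels).nonzero(as_tuple=True)[0]
    top_vals = adjusted[wrong, pred[wrong]].clone()
    true_vals = adjusted[wrong, labels[wrong]].clone()
    adjusted[wrong, pred[wrong]] = true_vals
    adjusted[wrong, labels[wrong]] = top_vals
    return adjusted


def dynamic_temperature_kd_loss(student_logits, teacher_logits, labels,
                                base_temp=4.0, alpha=0.9):
    """KD loss with a sample-wise temperature tied to teacher uncertainty."""
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=1)
        # Normalized entropy in [0, 1]; larger means a more confusing sample.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
        entropy = entropy / torch.log(torch.tensor(float(teacher_logits.size(1))))
        # Assumed schedule: lower the temperature for uncertain samples so
        # their targets are sharper and weigh more in the gradient.
        temp = (base_temp * (1.0 - 0.5 * entropy)).unsqueeze(1)

    soft_targets = F.softmax(teacher_logits / temp, dim=1)
    log_student = F.log_softmax(student_logits / temp, dim=1)
    kd = F.kl_div(log_student, soft_targets, reduction='none').sum(dim=1)
    kd = (kd * temp.squeeze(1) ** 2).mean()        # usual T^2 gradient scaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a training loop one would typically pass the frozen teacher's outputs through adjust_teacher_logits and feed the result, together with the student's logits and the labels, to dynamic_temperature_kd_loss.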
Similar resources
Challenges for Better thesis supervision
Background: Conducting a thesis is one of the students' major academic activities. Thesis quality and the experience gained depend heavily on the supervision. Our study aimed to identify the challenges in thesis supervision from both students' and faculty members' points of view. Methods: This study was conducted using individual in-depth interviews and Focus Group Discussi...
Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy
Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems — the models (often deep networks or wide net...
Topic Distillation with Knowledge Agents
This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...
Journal
Journal title: Neurocomputing
Year: 2021
ISSN: ['0925-2312', '1872-8286']
DOI: https://doi.org/10.1016/j.neucom.2021.04.102